Overview

Dataset statistics

Number of variables: 21
Number of observations: 198481
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2588
Duplicate rows (%): 1.3%
Total size in memory: 174.4 MiB
Average record size in memory: 921.3 B
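
The percentage and per-record figures above follow directly from the raw counts. The snippet below is a minimal sketch of that arithmetic, assuming 1 MiB = 1024 × 1024 bytes; the variable names are illustrative, and because the report rounds the total size to 174.4 MiB, the per-record value only agrees with the reported 921.3 B to within a fraction of a byte.

```python
# Figures copied from the dataset statistics above.
n_rows = 198_481
n_duplicates = 2_588
total_mib = 174.4  # already rounded by the report

# Share of duplicate rows, in percent.
duplicate_pct = 100 * n_duplicates / n_rows

# Average record size in bytes (assumes 1 MiB = 1024 * 1024 bytes).
avg_record_bytes = total_mib * 1024 * 1024 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```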

Variable types

Boolean: 3
Numeric: 6
Categorical: 12

Alerts

Dataset has 2588 (1.3%) duplicate rows [Duplicates]
party_sobriety is highly overall correlated with party_drug_physical [High correlation]
party_drug_physical is highly overall correlated with party_sobriety [High correlation]
vehicle_type is highly overall correlated with vehicle_transmission [High correlation]
vehicle_transmission is highly overall correlated with vehicle_type [High correlation]
direction is highly overall correlated with intersection [High correlation]
intersection is highly overall correlated with direction and 1 other field [High correlation]
weather_1 is highly overall correlated with road_surface [High correlation]
primary_collision_factor is highly overall correlated with pcf_violation_category [High correlation]
pcf_violation_category is highly overall correlated with intersection and 1 other field [High correlation]
road_surface is highly overall correlated with weather_1 [High correlation]
party_sobriety is highly imbalanced (53.9%) [Imbalance]
party_drug_physical is highly imbalanced (64.5%) [Imbalance]
cellphone_in_use is highly imbalanced (86.0%) [Imbalance]
vehicle_type is highly imbalanced (53.9%) [Imbalance]
weather_1 is highly imbalanced (68.7%) [Imbalance]
primary_collision_factor is highly imbalanced (97.3%) [Imbalance]
road_surface is highly imbalanced (74.6%) [Imbalance]
road_condition_1 is highly imbalanced (91.7%) [Imbalance]
distance is highly skewed (γ1 = 159.4162036) [Skewed]
insurance_premium has 33043 (16.6%) zeros [Zeros]
vehicle_age has 147442 (74.3%) zeros [Zeros]
distance has 41751 (21.0%) zeros [Zeros]
collision_time has 5407 (2.7%) zeros [Zeros]
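
Most of the high-correlation alerts involve pairs of categorical variables. One common association measure for such pairs is Cramér's V. The report does not say which statistic produced these alerts, so the function below is only a sketch of that measure (without bias correction) applied to a hypothetical contingency table; it does not reproduce any number from the report.

```python
import math

def cramers_v(table):
    """Cramér's V (no bias correction) for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical counts: rows = weather (clear, raining), columns = surface (dry, wet).
example = [[90, 10], [5, 45]]
print(round(cramers_v(example), 3))
```

A value near 0 indicates little association and a value near 1 indicates that one variable almost determines the other, which is the situation one would expect for a pair such as weather_1 and road_surface.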

Reproduction

Analysis started: 2023-11-15 15:55:42.519347
Analysis finished: 2023-11-15 15:56:03.679859
Duration: 21.16 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
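
The reported duration is the difference between the two timestamps. A small check using only the standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-11-15 15:55:42.519347")
finished = datetime.fromisoformat("2023-11-15 15:56:03.679859")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```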

Variables

at_fault
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 MiB

Value  Count   Frequency (%)
True   103047  51.9%
False  95434   48.1%

insurance_premium
Real number (ℝ)

Distinct: 105
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.229841
Minimum: 0
Maximum: 105
Zeros: 33043
Zeros (%): 16.6%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 20
Median: 31
Q3: 47
95-th percentile: 67
Maximum: 105
Range: 105
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 20.686013
Coefficient of variation (CV): 0.64182798
Kurtosis: -0.49810277
Mean: 32.229841
Median Absolute Deviation (MAD): 13
Skewness: 0.14408585
Sum: 6397011
Variance: 427.91115
Monotonicity: Not monotonic
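
Several of these statistics are derived from others in the same block. The sketch below recomputes the coefficient of variation, variance, interquartile range and range for insurance_premium from the reported mean, standard deviation, quartiles and extremes; the variable names are illustrative, and small differences in the last digits come from the report's own rounding.

```python
# Values copied from the quantile and descriptive statistics of insurance_premium.
mean = 32.229841
std = 20.686013
q1, q3 = 20, 47
minimum, maximum = 0, 105

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.8f} variance={variance:.5f} IQR={iqr} range={value_range}")
```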
Histogram with fixed size bins (bins=50)
Common values

Value              Count   Frequency (%)
0                  33043   16.6%
21                 5642    2.8%
20                 5482    2.8%
19                 5389    2.7%
22                 5335    2.7%
23                 5082    2.6%
24                 4657    2.3%
25                 4528    2.3%
18                 4467    2.3%
26                 4334    2.2%
Other values (95)  120522  60.7%

Minimum 10 values

Value  Count  Frequency (%)
0      33043  16.6%
1      10     < 0.1%
2      12     < 0.1%
3      13     < 0.1%
4      15     < 0.1%
5      24     < 0.1%
6      18     < 0.1%
7      31     < 0.1%
8      22     < 0.1%
9      27     < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
105    1      < 0.1%
104    1      < 0.1%
102    1      < 0.1%
101    3      < 0.1%
100    3      < 0.1%
99     7      < 0.1%
98     5      < 0.1%
97     8      < 0.1%
96     15     < 0.1%
95     21     < 0.1%

party_sobriety
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.3 MiB

Length

Max length: 38
Median length: 21
Mean length: 21.141097
Min length: 14

Characters and Unicode

Total characters: 4196106
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: had not been drinking
2nd row: had not been drinking
3rd row: had not been drinking
4th row: had not been drinking
5th row: had not been drinking

Common Values

Value                                   Count   Frequency (%)
had not been drinking                   153342  77.3%
impairment unknown                      18713   9.4%
not applicable                          13458   6.8%
had been drinking, under influence      10139   5.1%
had been drinking, impairment unknown   1550    0.8%
had been drinking, not under influence  1279    0.6%
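
The overview alerts flag party_sobriety as highly imbalanced (53.9%). That figure is consistent with one minus the Shannon entropy of the category frequencies divided by its maximum, the logarithm of the number of categories. The report does not state its formula, so the function below is a sketch of that assumed definition, and `imbalance_score` is an illustrative name rather than a library function.

```python
import math

def imbalance_score(counts):
    """1 - H / H_max: 0 for perfectly balanced classes, approaching 1 when one class dominates."""
    if len(counts) < 2:
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return 1 - entropy / math.log(len(counts))

# Category counts copied from the Common Values table for party_sobriety.
party_sobriety_counts = [153342, 18713, 13458, 10139, 1550, 1279]
print(f"{imbalance_score(party_sobriety_counts):.1%}")
```

Under the same assumption, a Boolean column split 194552 to 3929 scores about 0.86, which matches the 86.0% reported for cellphone_in_use.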

Length

Histogram of lengths of the category

Common Values (Plot)

Value       Count   Frequency (%)
not         168079  22.6%
had         166310  22.4%
been        166310  22.4%
drinking    166310  22.4%
impairment  20263   2.7%
unknown     20263   2.7%
applicable  13458   1.8%
under       11418   1.5%
influence   11418   1.5%

Most occurring characters

Value              Count   Frequency (%)
n                  782315  18.6%
(space)            545348  13.0%
e                  400595  9.5%
i                  398022  9.5%
d                  344038  8.2%
a                  213489  5.1%
r                  197991  4.7%
o                  188342  4.5%
t                  188342  4.5%
k                  186573  4.4%
Other values (11)  751051  17.9%

Most occurring categories

Value              Count    Frequency (%)
Lowercase Letter   3637790  86.7%
Space Separator    545348   13.0%
Other Punctuation  12968    0.3%
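
These category totals can be recomputed from the Common Values counts with Python's standard `unicodedata` module, which returns the two-letter Unicode general category of a character (`Ll` for lowercase letter, `Zs` for space separator, `Po` for other punctuation). This is a sketch of the idea, not necessarily how the profiling tool computes it:

```python
import unicodedata
from collections import Counter

# Value counts copied from the Common Values table for party_sobriety.
value_counts = {
    "had not been drinking": 153342,
    "impairment unknown": 18713,
    "not applicable": 13458,
    "had been drinking, under influence": 10139,
    "had been drinking, impairment unknown": 1550,
    "had been drinking, not under influence": 1279,
}

# Weight every character's general category by how often its value occurs.
category_totals = Counter()
for value, count in value_counts.items():
    for ch in value:
        category_totals[unicodedata.category(ch)] += count

for code, total in category_totals.most_common():
    print(code, total)
```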

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 782315
21.5%
e 400595
11.0%
i 398022
10.9%
d 344038
9.5%
a 213489
 
5.9%
r 197991
 
5.4%
o 188342
 
5.2%
t 188342
 
5.2%
k 186573
 
5.1%
b 179768
 
4.9%
Other values (9) 558315
15.3%
Space Separator
ValueCountFrequency (%)
545348
100.0%
Other Punctuation
ValueCountFrequency (%)
, 12968
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3637790
86.7%
Common 558316
 
13.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 782315
21.5%
e 400595
11.0%
i 398022
10.9%
d 344038
9.5%
a 213489
 
5.9%
r 197991
 
5.4%
o 188342
 
5.2%
t 188342
 
5.2%
k 186573
 
5.1%
b 179768
 
4.9%
Other values (9) 558315
15.3%
Common
ValueCountFrequency (%)
545348
97.7%
, 12968
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4196106
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 782315
18.6%
545348
13.0%
e 400595
9.5%
i 398022
9.5%
d 344038
8.2%
a 213489
 
5.1%
r 197991
 
4.7%
o 188342
 
4.5%
t 188342
 
4.5%
k 186573
 
4.4%
Other values (11) 751051
17.9%

party_drug_physical
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.8 MiB

Length

Max length: 21
Median length: 8
Mean length: 7.8870673
Min length: 1

Characters and Unicode

Total characters: 1565433
Distinct characters: 23
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: no drugs
2nd row: no drugs
3rd row: no drugs
4th row: no drugs
5th row: no drugs

Common Values

Value                  Count   Frequency (%)
no drugs               163558  82.4%
G                      18713   9.4%
not applicable         13458   6.8%
under drug influence   1516    0.8%
sleepy/fatigued        1072    0.5%
impairment - physical  164     0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
no 163558
43.2%
drugs 163558
43.2%
g 18713
 
4.9%
not 13458
 
3.6%
applicable 13458
 
3.6%
under 1516
 
0.4%
drug 1516
 
0.4%
influence 1516
 
0.4%
sleepy/fatigued 1072
 
0.3%
impairment 164
 
< 0.1%
Other values (2) 328
 
0.1%

Most occurring characters

ValueCountFrequency (%)
n 181728
11.6%
180376
11.5%
o 177016
11.3%
u 169178
10.8%
d 167662
10.7%
r 166754
10.7%
g 166146
10.6%
s 164794
10.5%
l 29668
 
1.9%
a 28316
 
1.8%
Other values (13) 133795
8.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1365108
87.2%
Space Separator 180376
 
11.5%
Uppercase Letter 18713
 
1.2%
Other Punctuation 1072
 
0.1%
Dash Punctuation 164
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 181728
13.3%
o 177016
13.0%
u 169178
12.4%
d 167662
12.3%
r 166754
12.2%
g 166146
12.2%
s 164794
12.1%
l 29668
 
2.2%
a 28316
 
2.1%
p 28316
 
2.1%
Other values (9) 85530
6.3%
Space Separator
ValueCountFrequency (%)
180376
100.0%
Uppercase Letter
ValueCountFrequency (%)
G 18713
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 1072
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 164
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1383821
88.4%
Common 181612
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 181728
13.1%
o 177016
12.8%
u 169178
12.2%
d 167662
12.1%
r 166754
12.1%
g 166146
12.0%
s 164794
11.9%
l 29668
 
2.1%
a 28316
 
2.0%
p 28316
 
2.0%
Other values (10) 104243
7.5%
Common
ValueCountFrequency (%)
180376
99.3%
/ 1072
 
0.6%
- 164
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1565433
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 181728
11.6%
180376
11.5%
o 177016
11.3%
u 169178
10.8%
d 167662
10.7%
r 166754
10.7%
g 166146
10.6%
s 164794
10.5%
l 29668
 
1.9%
a 28316
 
1.8%
Other values (13) 133795
8.5%

cellphone_in_use
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 MiB

Value  Count   Frequency (%)
False  194552  98.0%
True   3929    2.0%

vehicle_type
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 14
Median length: 14
Mean length: 11.519108
Min length: 5

Characters and Unicode

Total characters: 2286324
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: not applicable
2nd row: not applicable
3rd row: not applicable
4th row: not applicable
5th row: not applicable

Common Values

Value           Count   Frequency (%)
not applicable  142865  72.0%
sedan           34928   17.6%
coupe           18164   9.2%
hatchback       1579    0.8%
minivan         909     0.5%
other           36      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 142865
41.9%
applicable 142865
41.9%
sedan 34928
 
10.2%
coupe 18164
 
5.3%
hatchback 1579
 
0.5%
minivan 909
 
0.3%
other 36
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
a 324725
14.2%
p 303894
13.3%
l 285730
12.5%
e 195993
8.6%
n 179611
7.9%
c 164187
7.2%
o 161065
7.0%
i 144683
6.3%
t 144480
6.3%
b 144444
6.3%
Other values (9) 237512
10.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2143459
93.8%
Space Separator 142865
 
6.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 324725
15.1%
p 303894
14.2%
l 285730
13.3%
e 195993
9.1%
n 179611
8.4%
c 164187
7.7%
o 161065
7.5%
i 144683
6.7%
t 144480
6.7%
b 144444
6.7%
Other values (8) 94647
 
4.4%
Space Separator
ValueCountFrequency (%)
142865
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2143459
93.8%
Common 142865
 
6.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 324725
15.1%
p 303894
14.2%
l 285730
13.3%
e 195993
9.1%
n 179611
8.4%
c 164187
7.7%
o 161065
7.5%
i 144683
6.7%
t 144480
6.7%
b 144444
6.7%
Other values (8) 94647
 
4.4%
Common
ValueCountFrequency (%)
142865
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2286324
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 324725
14.2%
p 303894
13.3%
l 285730
12.5%
e 195993
8.6%
n 179611
7.9%
c 164187
7.2%
o 161065
7.0%
i 144683
6.3%
t 144480
6.3%
b 144444
6.3%
Other values (9) 237512
10.4%

vehicle_transmission
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 14
Median length: 14
Mean length: 11.537044
Min length: 4

Characters and Unicode

Total characters: 2289884
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: not applicable
2nd row: not applicable
3rd row: not applicable
4th row: not applicable
5th row: not applicable

Common Values

Value           Count   Frequency (%)
not applicable  143719  72.4%
manual          29385   14.8%
auto            25377   12.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
not 143719
42.0%
applicable 143719
42.0%
manual 29385
 
8.6%
auto 25377
 
7.4%

Most occurring characters

ValueCountFrequency (%)
a 371585
16.2%
l 316823
13.8%
p 287438
12.6%
n 173104
7.6%
o 169096
7.4%
t 169096
7.4%
143719
 
6.3%
i 143719
 
6.3%
c 143719
 
6.3%
b 143719
 
6.3%
Other values (3) 227866
10.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2146165
93.7%
Space Separator 143719
 
6.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 371585
17.3%
l 316823
14.8%
p 287438
13.4%
n 173104
8.1%
o 169096
7.9%
t 169096
7.9%
i 143719
 
6.7%
c 143719
 
6.7%
b 143719
 
6.7%
e 143719
 
6.7%
Other values (2) 84147
 
3.9%
Space Separator
ValueCountFrequency (%)
143719
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2146165
93.7%
Common 143719
 
6.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 371585
17.3%
l 316823
14.8%
p 287438
13.4%
n 173104
8.1%
o 169096
7.9%
t 169096
7.9%
i 143719
 
6.7%
c 143719
 
6.7%
b 143719
 
6.7%
e 143719
 
6.7%
Other values (2) 84147
 
3.9%
Common
ValueCountFrequency (%)
143719
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2289884
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 371585
16.2%
l 316823
13.8%
p 287438
12.6%
n 173104
7.6%
o 169096
7.4%
t 169096
7.4%
143719
 
6.3%
i 143719
 
6.3%
c 143719
 
6.3%
b 143719
 
6.3%
Other values (3) 227866
10.0%

vehicle_age
Real number (ℝ)

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.3267567
Minimum: 0
Maximum: 161
Zeros: 147442
Zeros (%): 74.3%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 8
Maximum: 161
Range: 161
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.7494633
Coefficient of variation (CV): 2.0723191
Kurtosis: 118.51067
Mean: 1.3267567
Median Absolute Deviation (MAD): 0
Skewness: 4.0748891
Sum: 263336
Variance: 7.5595485
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Common values

Value              Count   Frequency (%)
0                  147442  74.3%
3                  10883   5.5%
4                  7077    3.6%
2                  6082    3.1%
5                  5461    2.8%
6                  3927    2.0%
7                  3826    1.9%
8                  3500    1.8%
9                  2779    1.4%
1                  2440    1.2%
Other values (10)  5064    2.6%

Minimum 10 values

Value  Count   Frequency (%)
0      147442  74.3%
1      2440    1.2%
2      6082    3.1%
3      10883   5.5%
4      7077    3.6%
5      5461    2.8%
6      3927    2.0%
7      3826    1.9%
8      3500    1.8%
9      2779    1.4%

Maximum 10 values

Value  Count  Frequency (%)
161    2      < 0.1%
19     1      < 0.1%
17     3      < 0.1%
16     7      < 0.1%
15     41     < 0.1%
14     284    0.1%
13     558    0.3%
12     863    0.4%
11     1360   0.7%
10     1945   1.0%

county_city_location
Real number (ℝ)

Distinct: 509
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2772.833
Minimum: 100
Maximum: 5802
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 MiB

Quantile statistics

Minimum: 100
5-th percentile: 500
Q1: 1941
Median: 3000
Q3: 3700
95-th percentile: 5200
Maximum: 5802
Range: 5702
Interquartile range (IQR): 1759

Descriptive statistics

Standard deviation: 1306.9054
Coefficient of variation (CV): 0.47132497
Kurtosis: -0.37459915
Mean: 2772.833
Median Absolute Deviation (MAD): 1058
Skewness: 0.15853246
Sum: 5.5035466 × 10^8
Variance: 1708001.8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count   Frequency (%)
1942                24276   12.2%
1900                7735    3.9%
3400                3716    1.9%
3711                3347    1.7%
1941                3137    1.6%
4313                2981    1.5%
1500                2811    1.4%
109                 2650    1.3%
3001                2627    1.3%
3300                2470    1.2%
Other values (499)  142731  71.9%

Minimum 10 values

Value  Count  Frequency (%)
100    1332   0.7%
101    410    0.2%
102    60     < 0.1%
103    551    0.3%
104    240    0.1%
105    785    0.4%
106    749    0.4%
107    517    0.3%
108    168    0.1%
109    2650   1.3%

Maximum 10 values

Value  Count  Frequency (%)
5802   8      < 0.1%
5801   57     < 0.1%
5800   240    0.1%
5704   388    0.2%
5703   324    0.2%
5702   12     < 0.1%
5701   117    0.1%
5700   257    0.1%
5690   205    0.1%
5609   398    0.2%

distance
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 2242
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 643.38082
Minimum: 0
Maximum: 1584000
Zeros: 41751
Zeros (%): 21.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 14
Median: 100
Q3: 500
95-th percentile: 2640
Maximum: 1584000
Range: 1584000
Interquartile range (IQR): 486

Descriptive statistics

Standard deviation: 8205.8205
Coefficient of variation (CV): 12.75422
Kurtosis: 29129.625
Mean: 643.38082
Median Absolute Deviation (MAD): 100
Skewness: 159.4162
Sum: 1.2769887 × 10^8
Variance: 67335490
Monotonicity: Not monotonic
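
A skewness of about 159 means that a handful of very large distances dominate the distribution. The sketch below computes a plain moment-based skewness on a small hypothetical sample and shows how a `log1p` transform reduces it. The report may use a bias-adjusted estimator, so its values can differ slightly, and the transform is only one common option, not something the report prescribes.

```python
import math

def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

# Hypothetical sample shaped like the column: many small values, one extreme.
sample = [0, 0, 14, 50, 100, 100, 200, 500, 2640, 1584000]

raw = skewness(sample)
logged = skewness([math.log1p(v) for v in sample])  # log1p keeps zeros defined
print(f"raw: {raw:.2f}, after log1p: {logged:.2f}")
```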
Histogram with fixed size bins (bins=50)
Common values

Value                Count   Frequency (%)
0                    41751   21.0%
100                  8117    4.1%
200                  6698    3.4%
50                   5628    2.8%
300                  5594    2.8%
500                  5086    2.6%
528                  4786    2.4%
1056                 4539    2.3%
150                  3484    1.8%
20                   3423    1.7%
Other values (2232)  109375  55.1%

Minimum 10 values

Value  Count  Frequency (%)
0      41751  21.0%
1      251    0.1%
1.1    7      < 0.1%
1.17   2      < 0.1%
1.2    3      < 0.1%
1.25   2      < 0.1%
1.3    4      < 0.1%
1.33   1      < 0.1%
1.4    5      < 0.1%
1.5    15     < 0.1%

Maximum 10 values

Value    Count  Frequency (%)
1584000  4      < 0.1%
792000   2      < 0.1%
549120   2      < 0.1%
528000   1      < 0.1%
316800   2      < 0.1%
264000   2      < 0.1%
171600   1      < 0.1%
132000   5      < 0.1%
124080   1      < 0.1%
81312    1      < 0.1%

direction
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.3 MiB

Length

Max length: 7
Median length: 5
Mean length: 5.0605549
Min length: 4

Characters and Unicode

Total characters: 1004424
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: north
2nd row: north
3rd row: unknown
4th row: south
5th row: north

Common Values

Value    Count  Frequency (%)
north    43646  22.0%
south    43491  21.9%
unknown  41121  20.7%
west     35255  17.8%
east     34968  17.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
north 43646
22.0%
south 43491
21.9%
unknown 41121
20.7%
west 35255
17.8%
east 34968
17.6%

Most occurring characters

ValueCountFrequency (%)
n 167009
16.6%
t 157360
15.7%
o 128258
12.8%
s 113714
11.3%
h 87137
8.7%
u 84612
8.4%
w 76376
7.6%
e 70223
7.0%
r 43646
 
4.3%
k 41121
 
4.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1004424
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 167009
16.6%
t 157360
15.7%
o 128258
12.8%
s 113714
11.3%
h 87137
8.7%
u 84612
8.4%
w 76376
7.6%
e 70223
7.0%
r 43646
 
4.3%
k 41121
 
4.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 1004424
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 167009
16.6%
t 157360
15.7%
o 128258
12.8%
s 113714
11.3%
h 87137
8.7%
u 84612
8.4%
w 76376
7.6%
e 70223
7.0%
r 43646
 
4.3%
k 41121
 
4.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1004424
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 167009
16.6%
t 157360
15.7%
o 128258
12.8%
s 113714
11.3%
h 87137
8.7%
u 84612
8.4%
w 76376
7.6%
e 70223
7.0%
r 43646
 
4.3%
k 41121
 
4.1%

intersection
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 MiB

Value  Count   Frequency (%)
False  160887  81.1%
True   37594   18.9%

weather_1
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.3 MiB

Length

Max length: 7
Median length: 5
Mean length: 5.2363853
Min length: 3

Characters and Unicode

Total characters: 1039323
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: clear
2nd row: clear
3rd row: clear
4th row: clear
5th row: clear

Common Values

Value    Count   Frequency (%)
clear    158783  80.0%
cloudy   29633   14.9%
raining  8267    4.2%
fog      603     0.3%
unknown  556     0.3%
snowing  440     0.2%
other    164     0.1%
wind     35      < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
clear 158783
80.0%
cloudy 29633
 
14.9%
raining 8267
 
4.2%
fog 603
 
0.3%
unknown 556
 
0.3%
snowing 440
 
0.2%
other 164
 
0.1%
wind 35
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
c 188416
18.1%
l 188416
18.1%
r 167214
16.1%
a 167050
16.1%
e 158947
15.3%
o 31396
 
3.0%
u 30189
 
2.9%
d 29668
 
2.9%
y 29633
 
2.9%
n 19117
 
1.8%
Other values (8) 29277
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1039323
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 188416
18.1%
l 188416
18.1%
r 167214
16.1%
a 167050
16.1%
e 158947
15.3%
o 31396
 
3.0%
u 30189
 
2.9%
d 29668
 
2.9%
y 29633
 
2.9%
n 19117
 
1.8%
Other values (8) 29277
 
2.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 1039323
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
c 188416
18.1%
l 188416
18.1%
r 167214
16.1%
a 167050
16.1%
e 158947
15.3%
o 31396
 
3.0%
u 30189
 
2.9%
d 29668
 
2.9%
y 29633
 
2.9%
n 19117
 
1.8%
Other values (8) 29277
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1039323
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 188416
18.1%
l 188416
18.1%
r 167214
16.1%
a 167050
16.1%
e 158947
15.3%
o 31396
 
3.0%
u 30189
 
2.9%
d 29668
 
2.9%
y 29633
 
2.9%
n 19117
 
1.8%
Other values (8) 29277
 
2.8%

location_type
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.3 MiB

Length

Max length: 12
Median length: 4
Mean length: 5.1862143
Min length: 4

Characters and Unicode

Total characters: 1029365
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: road
2nd row: road
3rd row: road
4th row: road
5th row: road

Common Values

Value         Count   Frequency (%)
road          115303  58.1%
highway       68547   34.5%
ramp          10906   5.5%
intersection  3725    1.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
road 115303
58.1%
highway 68547
34.5%
ramp 10906
 
5.5%
intersection 3725
 
1.9%

Most occurring characters

ValueCountFrequency (%)
a 194756
18.9%
h 137094
13.3%
r 129934
12.6%
o 119028
11.6%
d 115303
11.2%
i 75997
 
7.4%
g 68547
 
6.7%
w 68547
 
6.7%
y 68547
 
6.7%
m 10906
 
1.1%
Other values (6) 40706
 
4.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1029365
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 194756
18.9%
h 137094
13.3%
r 129934
12.6%
o 119028
11.6%
d 115303
11.2%
i 75997
 
7.4%
g 68547
 
6.7%
w 68547
 
6.7%
y 68547
 
6.7%
m 10906
 
1.1%
Other values (6) 40706
 
4.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1029365
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 194756
18.9%
h 137094
13.3%
r 129934
12.6%
o 119028
11.6%
d 115303
11.2%
i 75997
 
7.4%
g 68547
 
6.7%
w 68547
 
6.7%
y 68547
 
6.7%
m 10906
 
1.1%
Other values (6) 40706
 
4.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1029365
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 194756
18.9%
h 137094
13.3%
r 129934
12.6%
o 119028
11.6%
d 115303
11.2%
i 75997
 
7.4%
g 68547
 
6.7%
w 68547
 
6.7%
y 68547
 
6.7%
m 10906
 
1.1%
Other values (6) 40706
 
4.0%

primary_collision_factor
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 16.5 MiB

Length

Max length: 22
Median length: 22
Mean length: 21.999723
Min length: 11

Characters and Unicode

Total characters: 4366527
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: vehicle code violation
2nd row: vehicle code violation
3rd row: vehicle code violation
4th row: vehicle code violation
5th row: vehicle code violation

Common Values

Value                   Count   Frequency (%)
vehicle code violation  197551  99.5%
other improper driving  925     0.5%
fell asleep             5       < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
vehicle 197551
33.2%
code 197551
33.2%
violation 197551
33.2%
other 925
 
0.2%
improper 925
 
0.2%
driving 925
 
0.2%
fell 5
 
< 0.1%
asleep 5
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
i 595428
13.6%
e 594518
13.6%
o 594503
13.6%
396957
9.1%
v 396027
9.1%
l 395117
9.0%
c 395102
9.0%
n 198476
 
4.5%
h 198476
 
4.5%
d 198476
 
4.5%
Other values (8) 403447
9.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3969570
90.9%
Space Separator 396957
 
9.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 595428
15.0%
e 594518
15.0%
o 594503
15.0%
v 396027
10.0%
l 395117
10.0%
c 395102
10.0%
n 198476
 
5.0%
h 198476
 
5.0%
d 198476
 
5.0%
t 198476
 
5.0%
Other values (7) 204971
 
5.2%
Space Separator
ValueCountFrequency (%)
396957
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3969570
90.9%
Common 396957
 
9.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 595428
15.0%
e 594518
15.0%
o 594503
15.0%
v 396027
10.0%
l 395117
10.0%
c 395102
10.0%
n 198476
 
5.0%
h 198476
 
5.0%
d 198476
 
5.0%
t 198476
 
5.0%
Other values (7) 204971
 
5.2%
Common
ValueCountFrequency (%)
396957
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4366527
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 595428
13.6%
e 594518
13.6%
o 594503
13.6%
396957
9.1%
v 396027
9.1%
l 395117
9.0%
c 395102
9.0%
n 198476
 
4.5%
h 198476
 
4.5%
d 198476
 
4.5%
Other values (8) 403447
9.2%

pcf_violation_category
Categorical

Distinct: 21
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.9 MiB

Length

Max length: 26
Median length: 25
Mean length: 13.934618
Min length: 3

Characters and Unicode

Total characters: 2765757
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: speeding
2nd row: speeding
3rd row: dui
4th row: improper turning
5th row: speeding

Common Values

Value                       Count  Frequency (%)
speeding                    71917  36.2%
improper turning            33881  17.1%
automobile right of way     21064  10.6%
unsafe lane change          17668  8.9%
dui                         17141  8.6%
unsafe starting or backing  9322   4.7%
traffic signals and signs   8521   4.3%
following too closely       4518   2.3%
wrong side of road          3700   1.9%
unknown                     3313   1.7%
Other values (11)           7436   3.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
speeding 71917
17.2%
improper 36819
 
8.8%
turning 33881
 
8.1%
unsafe 26990
 
6.5%
of 26551
 
6.3%
right 22851
 
5.5%
way 22851
 
5.5%
automobile 21064
 
5.0%
change 17668
 
4.2%
lane 17668
 
4.2%
Other values (28) 120129
28.7%

Most occurring characters

ValueCountFrequency (%)
e 279894
 
10.1%
n 271851
 
9.8%
i 266831
 
9.6%
219908
 
8.0%
g 193548
 
7.0%
r 173429
 
6.3%
a 164765
 
6.0%
s 158617
 
5.7%
o 157437
 
5.7%
p 150107
 
5.4%
Other values (15) 729370
26.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2545849
92.0%
Space Separator 219908
 
8.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 279894
11.0%
n 271851
10.7%
i 266831
10.5%
g 193548
 
7.6%
r 173429
 
6.8%
a 164765
 
6.5%
s 158617
 
6.2%
o 157437
 
6.2%
p 150107
 
5.9%
t 116938
 
4.6%
Other values (14) 612432
24.1%
Space Separator
ValueCountFrequency (%)
219908
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2545849
92.0%
Common 219908
 
8.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 279894
11.0%
n 271851
10.7%
i 266831
10.5%
g 193548
 
7.6%
r 173429
 
6.8%
a 164765
 
6.5%
s 158617
 
6.2%
o 157437
 
6.2%
p 150107
 
5.9%
t 116938
 
4.6%
Other values (14) 612432
24.1%
Common
ValueCountFrequency (%)
219908
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2765757
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 279894
 
10.1%
n 271851
 
9.8%
i 266831
 
9.6%
219908
 
8.0%
g 193548
 
7.0%
r 173429
 
6.3%
a 164765
 
6.0%
s 158617
 
5.7%
o 157437
 
5.7%
p 150107
 
5.4%
Other values (15) 729370
26.4%

road_surface
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size12.9 MiB
dry
178231 
wet
19176 
snowy
 
937
slippery
 
137

Length

Max length8
Median length3
Mean length3.0128929
Min length3

Characters and Unicode

Total characters598002
Distinct characters12
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  dry
2nd row  dry
3rd row  dry
4th row  dry
5th row  dry

Common Values

Value     Count   Frequency (%)
dry       178231  89.8%
wet       19176   9.7%
snowy     937     0.5%
slippery  137     0.1%
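
The overview flags road_surface as highly imbalanced (74.6%). One way to arrive at that figure, assuming the score is one minus the Shannon entropy of the class frequencies divided by its maximum possible value, is sketched below. The function name imbalance_score and the single-class convention are illustrative assumptions, not the library's API.

```python
import math

# Category counts for road_surface, copied from this report.
value_counts = {"dry": 178231, "wet": 19176, "snowy": 937, "slippery": 137}


def imbalance_score(counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # assumed convention for a single-class column
    entropy = -sum(p * math.log(p) for p in probs)
    return 1 - entropy / math.log(len(probs))


score = imbalance_score(value_counts.values())
print(f"{score:.1%}")
```

A score of 0 would mean all four categories are equally frequent; values near 1 mean one category dominates, as "dry" does here.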

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count   Frequency (%)
dry       178231  89.8%
wet       19176   9.7%
snowy     937     0.5%
slippery  137     0.1%

Most occurring characters

Value             Count   Frequency (%)
y                 179305  30.0%
r                 178368  29.8%
d                 178231  29.8%
w                 20113   3.4%
e                 19313   3.2%
t                 19176   3.2%
s                 1074    0.2%
n                 937     0.2%
o                 937     0.2%
p                 274     < 0.1%
Other values (2)  274     < 0.1%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  598002  100.0%

Most frequent character per category

Lowercase Letter
Value             Count   Frequency (%)
y                 179305  30.0%
r                 178368  29.8%
d                 178231  29.8%
w                 20113   3.4%
e                 19313   3.2%
t                 19176   3.2%
s                 1074    0.2%
n                 937     0.2%
o                 937     0.2%
p                 274     < 0.1%
Other values (2)  274     < 0.1%

Most occurring scripts

Value  Count   Frequency (%)
Latin  598002  100.0%

Most frequent character per script

Latin
Value             Count   Frequency (%)
y                 179305  30.0%
r                 178368  29.8%
d                 178231  29.8%
w                 20113   3.4%
e                 19313   3.2%
t                 19176   3.2%
s                 1074    0.2%
n                 937     0.2%
o                 937     0.2%
p                 274     < 0.1%
Other values (2)  274     < 0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  598002  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
y                 179305  30.0%
r                 178368  29.8%
d                 178231  29.8%
w                 20113   3.4%
e                 19313   3.2%
t                 19176   3.2%
s                 1074    0.2%
n                 937     0.2%
o                 937     0.2%
p                 274     < 0.1%
Other values (2)  274     < 0.1%

road_condition_1
Categorical

Distinct      8
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   13.5 MiB

normal            192684
construction      3215
other             751
obstruction       636
holes             589
Other values (3)  606

Length

Max length     14
Median length  6
Mean length    6.1260221
Min length     5

Characters and Unicode

Total characters     1215899
Distinct characters  18
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  normal
2nd row  normal
3rd row  normal
4th row  normal
5th row  normal

Common Values

Value           Count   Frequency (%)
normal          192684  97.1%
construction    3215    1.6%
other           751     0.4%
obstruction     636     0.3%
holes           589     0.3%
loose material  277     0.1%
reduced width   223     0.1%
flooded         106     0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value         Count   Frequency (%)
normal        192684  96.8%
construction  3215    1.6%
other         751     0.4%
obstruction   636     0.3%
holes         589     0.3%
loose         277     0.1%
material      277     0.1%
reduced       223     0.1%
width         223     0.1%
flooded       106     0.1%

Most occurring characters

Value             Count   Frequency (%)
o                 202492  16.7%
n                 199750  16.4%
r                 197786  16.3%
l                 193933  15.9%
a                 193238  15.9%
m                 192961  15.9%
t                 8953    0.7%
c                 7289    0.6%
s                 4717    0.4%
i                 4351    0.4%
Other values (8)  10429   0.9%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  1215399  > 99.9%
Space Separator   500      < 0.1%

Most frequent character per category

Lowercase Letter
Value             Count   Frequency (%)
o                 202492  16.7%
n                 199750  16.4%
r                 197786  16.3%
l                 193933  16.0%
a                 193238  15.9%
m                 192961  15.9%
t                 8953    0.7%
c                 7289    0.6%
s                 4717    0.4%
i                 4351    0.4%
Other values (7)  9929    0.8%
Space Separator
Value    Count  Frequency (%)
(space)  500    100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   1215399  > 99.9%
Common  500      < 0.1%

Most frequent character per script

Latin
Value             Count   Frequency (%)
o                 202492  16.7%
n                 199750  16.4%
r                 197786  16.3%
l                 193933  16.0%
a                 193238  15.9%
m                 192961  15.9%
t                 8953    0.7%
c                 7289    0.6%
s                 4717    0.4%
i                 4351    0.4%
Other values (7)  9929    0.8%
Common
Value    Count  Frequency (%)
(space)  500    100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1215899  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
o                 202492  16.7%
n                 199750  16.4%
r                 197786  16.3%
l                 193933  15.9%
a                 193238  15.9%
m                 192961  15.9%
t                 8953    0.7%
c                 7289    0.6%
s                 4717    0.4%
i                 4351    0.4%
Other values (8)  10429   0.9%

lighting
Categorical

Distinct      5
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   14.7 MiB

daylight                                 134738
dark with street lights                  42292
dark with no street lights               14232
dusk or dawn                             6775
dark with street lights not functioning  444

Length

Max length     39
Median length  8
Mean length    12.692741
Min length     8

Characters and Unicode

Total characters     2519268
Distinct characters  19
Distinct categories  2
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  dark with street lights
2nd row  dark with street lights
3rd row  daylight
4th row  daylight
5th row  dark with street lights

Common Values

Value                                    Count   Frequency (%)
daylight                                 134738  67.9%
dark with street lights                  42292   21.3%
dark with no street lights               14232   7.2%
dusk or dawn                             6775    3.4%
dark with street lights not functioning  444     0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value     Count   Frequency (%)
daylight  134738  33.8%
dark      56968   14.3%
with      56968   14.3%
street    56968   14.3%
lights    56968   14.3%
no        14232   3.6%
dusk      6775    1.7%
or        6775    1.7%
dawn      6775    1.7%
not       444     0.1%
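
The word-level counts in this view differ from the category-level counts because each category is split into words before counting. A short sketch, assuming a plain whitespace split (the report's exact tokenisation is not stated), rebuilds the word frequencies from the five lighting categories:

```python
from collections import Counter

# Category counts for lighting, copied from this report.
value_counts = {
    "daylight": 134738,
    "dark with street lights": 42292,
    "dark with no street lights": 14232,
    "dusk or dawn": 6775,
    "dark with street lights not functioning": 444,
}

# Every word of a category is counted once per row holding that category.
word_counts = Counter()
for value, count in value_counts.items():
    for word in value.split():
        word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common(10):
    print(f"{word:<10} {count:>7} {count / total_words:6.1%}")
```

Because the denominator is the total number of words rather than the number of rows, "daylight" drops from 67.9% of rows to 33.8% of words.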

Most occurring characters

Value             Count   Frequency (%)
t                 363498  14.4%
i                 249562  9.9%
h                 248674  9.9%
d                 205256  8.1%
(space)           199574  7.9%
a                 198481  7.9%
g                 192150  7.6%
l                 191706  7.6%
y                 134738  5.3%
s                 120711  4.8%
Other values (9)  414918  16.5%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  2319694  92.1%
Space Separator   199574   7.9%

Most frequent character per category

Lowercase Letter
Value             Count   Frequency (%)
t                 363498  15.7%
i                 249562  10.8%
h                 248674  10.7%
d                 205256  8.8%
a                 198481  8.6%
g                 192150  8.3%
l                 191706  8.3%
y                 134738  5.8%
s                 120711  5.2%
r                 120711  5.2%
Other values (8)  294207  12.7%
Space Separator
Value    Count   Frequency (%)
(space)  199574  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   2319694  92.1%
Common  199574   7.9%

Most frequent character per script

Latin
Value             Count   Frequency (%)
t                 363498  15.7%
i                 249562  10.8%
h                 248674  10.7%
d                 205256  8.8%
a                 198481  8.6%
g                 192150  8.3%
l                 191706  8.3%
y                 134738  5.8%
s                 120711  5.2%
r                 120711  5.2%
Other values (8)  294207  12.7%
Common
Value    Count   Frequency (%)
(space)  199574  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2519268  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
t                 363498  14.4%
i                 249562  9.9%
h                 248674  9.9%
d                 205256  8.1%
(space)           199574  7.9%
a                 198481  7.9%
g                 192150  7.6%
l                 191706  7.6%
y                 134738  5.3%
s                 120711  4.8%
Other values (9)  414918  16.5%

collision_time
Real number (ℝ)

Distinct      24
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          12.842882
Minimum       0
Maximum       23
Zeros         5407
Zeros (%)     2.7%
Negative      0
Negative (%)  0.0%
Memory size   3.0 MiB

Quantile statistics

Minimum                    0
5-th percentile            2
Q1                         9
median                     14
Q3                         17
95-th percentile           21
Maximum                    23
Range                      23
Interquartile range (IQR)  8

Descriptive statistics

Standard deviation               5.6884002
Coefficient of variation (CV)    0.44292241
Kurtosis                         -0.51959289
Mean                             12.842882
Median Absolute Deviation (MAD)  4
Skewness                         -0.39087277
Sum                              2549068
Variance                         32.357896
Monotonicity                     Not monotonic
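
Several of the collision_time statistics above are derived from one another. The sketch below checks those relationships using only figures printed in this report; the variable names are illustrative.

```python
# Figures copied from the collision_time summary in this report.
n_rows = 198481
total = 2549068          # Sum
std = 5.6884002          # Standard deviation
q1, q3 = 9, 17
minimum, maximum = 0, 23

mean = total / n_rows             # Mean = Sum / number of observations
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
value_range = maximum - minimum   # Range = Maximum - Minimum

print(round(mean, 6), round(cv, 6), round(variance, 4), iqr, value_range)
```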
Histogram with fixed size bins (bins=24)
Value              Count  Frequency (%)
15                 15368  7.7%
17                 15202  7.7%
16                 14290  7.2%
18                 12897  6.5%
14                 12772  6.4%
8                  11741  5.9%
13                 11161  5.6%
7                  11094  5.6%
12                 10970  5.5%
11                 9239   4.7%
Other values (14)  73747  37.2%

Value  Count  Frequency (%)
0      5407   2.7%
1      3806   1.9%
2      4020   2.0%
3      2590   1.3%
4      2007   1.0%
5      3005   1.5%
6      5211   2.6%
7      11094  5.6%
8      11741  5.9%
9      8755   4.4%

Value  Count  Frequency (%)
23     4518   2.3%
22     5266   2.7%
21     6117   3.1%
20     6563   3.3%
19     8171   4.1%
18     12897  6.5%
17     15202  7.7%
16     14290  7.2%
15     15368  7.7%
14     12772  6.4%

month
Real number (ℝ)

Distinct      12
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          3.0637744
Minimum       1
Maximum       12
Zeros         0
Zeros (%)     0.0%
Negative      0
Negative (%)  0.0%
Memory size   3.0 MiB

Quantile statistics

Minimum                    1
5-th percentile            1
Q1                         2
median                     3
Q3                         4
95-th percentile           5
Maximum                    12
Range                      11
Interquartile range (IQR)  2

Descriptive statistics

Standard deviation               1.6234079
Coefficient of variation (CV)    0.52987188
Kurtosis                         2.0989853
Mean                             3.0637744
Median Absolute Deviation (MAD)  1
Skewness                         0.86431594
Sum                              608101
Variance                         2.6354532
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=12)
Value             Count  Frequency (%)
3                 41566  20.9%
1                 40844  20.6%
2                 38907  19.6%
4                 37395  18.8%
5                 32586  16.4%
6                 4100   2.1%
8                 858    0.4%
9                 676    0.3%
7                 563    0.3%
10                386    0.2%
Other values (2)  600    0.3%

Value  Count  Frequency (%)
1      40844  20.6%
2      38907  19.6%
3      41566  20.9%
4      37395  18.8%
5      32586  16.4%
6      4100   2.1%
7      563    0.3%
8      858    0.4%
9      676    0.3%
10     386    0.2%

Value  Count  Frequency (%)
12     286    0.1%
11     314    0.2%
10     386    0.2%
9      676    0.3%
8      858    0.4%
7      563    0.3%
6      4100   2.1%
5      32586  16.4%
4      37395  18.8%
3      41566  20.9%
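
Since month takes only 12 distinct values, its mean and quantiles can be recovered from the frequency tables alone. The sketch below does so with a simple inverse-CDF rule (the smallest value whose cumulative count reaches the requested fraction); this is an assumed definition that may differ from the interpolation the report uses, although both give the same answers on this data. The helper name quantile_from_counts is illustrative.

```python
# Row counts per month, copied from the frequency tables in this report.
month_counts = {
    1: 40844, 2: 38907, 3: 41566, 4: 37395, 5: 32586, 6: 4100,
    7: 563, 8: 858, 9: 676, 10: 386, 11: 314, 12: 286,
}


def quantile_from_counts(counts, q):
    """Smallest value whose cumulative count reaches q times the total."""
    threshold = q * sum(counts.values())
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        if cumulative >= threshold:
            return value
    return max(counts)


n_rows = sum(month_counts.values())
mean = sum(value * count for value, count in month_counts.items()) / n_rows
quantiles = {q: quantile_from_counts(month_counts, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)}

print(n_rows, round(mean, 7), quantiles)
```

Months 1 to 5 account for more than 95% of the rows, which is why the 95th percentile is 5 even though the maximum is 12.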

Interactions

[Figure: 36 pairwise interaction plots]

Correlations

insurance_premium vehicle_age county_city_location distance collision_time month at_fault party_sobriety party_drug_physical cellphone_in_use vehicle_type vehicle_transmission direction intersection weather_1 location_type primary_collision_factor pcf_violation_category road_surface road_condition_1 lighting
insurance_premium 1.000 0.157 0.033 0.023 0.007 0.006 0.155 0.373 0.365 0.045 0.118 0.178 0.038 0.092 0.018 0.129 0.024 0.090 0.022 0.012 0.114
vehicle_age 0.157 1.000 0.026 0.031 0.042 0.044 0.000 0.000 0.000 0.000 0.007 0.006 0.000 0.003 0.000 0.003 0.000 0.000 0.002 0.000 0.004
county_city_location 0.033 0.026 1.000 0.034 0.008 0.006 0.091 0.112 0.102 0.394 0.098 0.123 0.197 0.228 0.161 0.260 0.081 0.113 0.262 0.114 0.151
distance 0.023 0.031 0.034 1.000 -0.040 0.023 0.000 0.005 0.000 0.000 0.000 0.000 0.003 0.000 0.000 0.000 0.000 0.004 0.000 0.000 0.000
collision_time 0.007 0.042 0.008 -0.040 1.000 0.005 0.073 0.185 0.114 0.007 0.043 0.060 0.032 0.064 0.052 0.081 0.011 0.134 0.042 0.018 0.448
month 0.006 0.044 0.006 0.023 0.005 1.000 0.005 0.021 0.058 0.012 0.063 0.073 0.013 0.025 0.066 0.022 0.035 0.032 0.093 0.015 0.074
at_fault 0.155 0.000 0.091 0.000 0.073 0.005 1.000 0.409 0.353 0.016 0.145 0.043 0.015 0.017 0.034 0.033 0.005 0.122 0.046 0.015 0.068
party_sobriety 0.373 0.000 0.112 0.005 0.185 0.021 0.409 1.000 0.633 0.047 0.117 0.167 0.033 0.083 0.019 0.135 0.024 0.330 0.016 0.016 0.169
party_drug_physical 0.365 0.000 0.102 0.000 0.114 0.058 0.353 0.633 1.000 0.039 0.108 0.158 0.031 0.079 0.018 0.126 0.024 0.155 0.017 0.016 0.103
cellphone_in_use 0.045 0.000 0.394 0.000 0.007 0.012 0.016 0.047 0.039 1.000 0.007 0.006 0.011 0.008 0.011 0.017 0.002 0.022 0.019 0.005 0.002
vehicle_type 0.118 0.007 0.098 0.000 0.043 0.063 0.145 0.117 0.108 0.007 1.000 0.709 0.042 0.090 0.013 0.062 0.008 0.239 0.016 0.011 0.034
vehicle_transmission 0.178 0.006 0.123 0.000 0.060 0.073 0.043 0.167 0.158 0.006 0.709 1.000 0.029 0.053 0.010 0.029 0.008 0.098 0.011 0.011 0.038
direction 0.038 0.000 0.197 0.003 0.032 0.013 0.015 0.033 0.031 0.011 0.042 0.029 1.000 0.941 0.023 0.227 0.008 0.279 0.022 0.027 0.050
intersection 0.092 0.003 0.228 0.000 0.064 0.025 0.017 0.083 0.079 0.008 0.090 0.053 0.941 1.000 0.031 0.373 0.009 0.577 0.026 0.043 0.094
weather_1 0.018 0.000 0.161 0.000 0.052 0.066 0.034 0.019 0.018 0.011 0.013 0.010 0.023 0.031 1.000 0.044 0.009 0.032 0.551 0.044 0.040
location_type 0.129 0.003 0.260 0.000 0.081 0.022 0.033 0.135 0.126 0.017 0.062 0.029 0.227 0.373 0.044 1.000 0.028 0.280 0.040 0.064 0.107
primary_collision_factor 0.024 0.000 0.081 0.000 0.011 0.035 0.005 0.024 0.024 0.002 0.008 0.008 0.008 0.009 0.009 0.028 1.000 1.000 0.007 0.000 0.007
pcf_violation_category 0.090 0.000 0.113 0.004 0.134 0.032 0.122 0.330 0.155 0.022 0.239 0.098 0.279 0.577 0.032 0.280 1.000 1.000 0.056 0.033 0.163
road_surface 0.022 0.002 0.262 0.000 0.042 0.093 0.046 0.016 0.017 0.019 0.016 0.011 0.022 0.026 0.551 0.040 0.007 0.056 1.000 0.099 0.035
road_condition_1 0.012 0.000 0.114 0.000 0.018 0.015 0.015 0.016 0.016 0.005 0.011 0.011 0.027 0.043 0.044 0.064 0.000 0.033 0.099 1.000 0.027
lighting 0.114 0.004 0.151 0.000 0.448 0.074 0.068 0.169 0.103 0.002 0.034 0.038 0.050 0.094 0.040 0.107 0.007 0.163 0.035 0.027 1.000
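
The "High correlation" alerts in the overview can be related back to this matrix by filtering on a cutoff. The sketch below does that for a handful of off-diagonal entries copied from the matrix; the 0.5 threshold is an assumption chosen because it reproduces the pairs named in the alerts, not a setting confirmed by this report.

```python
# Selected off-diagonal entries copied from the correlation matrix.
correlations = {
    ("direction", "intersection"): 0.941,
    ("primary_collision_factor", "pcf_violation_category"): 1.000,
    ("vehicle_type", "vehicle_transmission"): 0.709,
    ("party_sobriety", "party_drug_physical"): 0.633,
    ("intersection", "pcf_violation_category"): 0.577,
    ("weather_1", "road_surface"): 0.551,
    ("collision_time", "lighting"): 0.448,
    ("at_fault", "party_sobriety"): 0.409,
    ("county_city_location", "cellphone_in_use"): 0.394,
    ("intersection", "location_type"): 0.373,
}

THRESHOLD = 0.5  # assumed cutoff for a "high" correlation

# Keep pairs at or above the cutoff, strongest first.
high = sorted(
    ((pair, value) for pair, value in correlations.items() if abs(value) >= THRESHOLD),
    key=lambda item: -abs(item[1]),
)

for (left, right), value in high:
    print(f"{left} ~ {right}: {value:.3f}")
```

The pairs just under the cutoff, such as collision_time and lighting at 0.448, are worth a look even though they raise no alert.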

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

at_fault insurance_premium party_sobriety party_drug_physical cellphone_in_use vehicle_type vehicle_transmission vehicle_age county_city_location distance direction intersection weather_1 location_type primary_collision_factor pcf_violation_category road_surface road_condition_1 lighting collision_time month
0False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.0370230.0northFalseclearroadvehicle code violationspeedingdrynormaldark with street lights111
1False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.0370230.0northFalseclearroadvehicle code violationspeedingdrynormaldark with street lights111
2False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.019610.0unknownFalseclearroadvehicle code violationduidrynormaldaylight124
3False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.04310680.0southFalseclearroadvehicle code violationimproper turningdrynormaldaylight92
4False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.0370230.0northFalseclearroadvehicle code violationspeedingdrynormaldark with street lights111
5False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.0194170.0eastFalseclearroadvehicle code violationunknowndrynormaldusk or dawn64
6False64.0had not been drinkingno drugsFalsehatchbackauto10.03711289.0southFalseclearroadvehicle code violationspeedingdrynormaldaylight72
7False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.03616650.0eastFalserainingroadvehicle code violationunknownwetnormaldark with street lights204
8False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.0431230.0southFalsecloudyroadvehicle code violationunsafe starting or backingdrynormaldaylight143
9False0.0not applicablenot applicableFalsenot applicablenot applicable0.02900432.0northFalseclearroadvehicle code violationspeedingdrynormaldark with no street lights21
at_fault insurance_premium party_sobriety party_drug_physical cellphone_in_use vehicle_type vehicle_transmission vehicle_age county_city_location distance direction intersection weather_1 location_type primary_collision_factor pcf_violation_category road_surface road_condition_1 lighting collision_time month
198471False39.0had not been drinkingno drugsFalsesedanmanual14.030261900.0southFalseclearhighwayvehicle code violationunsafe lane changedrynormaldaylight91
198472True50.0had not been drinkingno drugsFalsenot applicablenot applicable0.033131584.0westFalseclearhighwayvehicle code violationspeedingdrynormaldark with no street lights171
198473False23.0had not been drinkingno drugsFalsenot applicablenot applicable0.033131584.0westFalseclearhighwayvehicle code violationspeedingdrynormaldark with no street lights171
198474True27.0had not been drinkingno drugsFalsenot applicablenot applicable0.0302650.0northFalseclearhighwayvehicle code violationspeedingdrynormaldaylight151
198475False25.0had not been drinkingno drugsFalsenot applicablenot applicable0.0302650.0northFalseclearhighwayvehicle code violationspeedingdrynormaldaylight151
198476False39.0had not been drinkingno drugsFalsenot applicablenot applicable0.0302650.0northFalseclearhighwayvehicle code violationspeedingdrynormaldaylight151
198477True0.0impairment unknownGFalsenot applicablenot applicable0.0330066.0westFalseclearroadvehicle code violationunsafe starting or backingdrynormaldark with street lights221
198478True24.0had been drinking, under influenceno drugsFalsenot applicablenot applicable0.0331315.0southFalseclearhighwayvehicle code violationduidrynormaldark with street lights181
198479False59.0had not been drinkingno drugsFalsenot applicablenot applicable0.0331315.0southFalseclearhighwayvehicle code violationduidrynormaldark with street lights181
198480True27.0had not been drinkingno drugsFalsesedanmanual1.03394133.0northFalseclearroadvehicle code violationunsafe starting or backingdrynormaldaylight151

Duplicate rows

Most frequently occurring

at_fault insurance_premium party_sobriety party_drug_physical cellphone_in_use vehicle_type vehicle_transmission vehicle_age county_city_location distance direction intersection weather_1 location_type primary_collision_factor pcf_violation_category road_surface road_condition_1 lighting collision_time month # duplicates
755False0.0not applicablenot applicableFalsenot applicablenot applicable0.0192094.0eastFalseclearroadvehicle code violationduidrynormaldark with street lights3211
441False0.0not applicablenot applicableFalsenot applicablenot applicable0.001090.0unknownTrueclearroadvehicle code violationimproper turningdrynormaldusk or dawn648
1979False0.0not applicablenot applicableFalsenot applicablenot applicable0.04203159.0southFalseclearroadvehicle code violationimproper turningdrynormaldaylight947
291False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.038010.0unknownTrueclearroadvehicle code violationother equipmentdryconstructiondark with street lights186
406False0.0not applicablenot applicableFalsenot applicablenot applicable0.0010136.0westFalserainingroadvehicle code violationspeedingwetnormaldaylight926
1237False0.0not applicablenot applicableFalsenot applicablenot applicable0.01942251.0southFalseclearroadvehicle code violationspeedingdrynormaldaylight1656
1282False0.0not applicablenot applicableFalsenot applicablenot applicable0.01942390.0westFalseclearroadvehicle code violationduidrynormaldark with street lights346
2082False0.0not applicablenot applicableFalsenot applicablenot applicable0.0490548.0southFalseclearroadvehicle code violationduidrynormaldark with street lights2216
35False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.0100847.0westFalseclearhighwayvehicle code violationimproper turningdrynormaldaylight1245
91False0.0had not been drinkingno drugsFalsenot applicablenot applicable0.01941100.0eastFalseunknownroadvehicle code violationunsafe lane changedrynormaldaylight1045
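
A table like the one above can be produced by grouping identical rows and counting them. The sketch below shows the idea on a few hypothetical rows that are not taken from the dataset; how the report itself defines the headline duplicate count is not stated here, so the "extra copies" tally is only one reasonable reading.

```python
from collections import Counter

# Hypothetical miniature rows (road_surface, lighting, collision_time, month).
rows = [
    ("dry", "daylight", 9, 4),
    ("dry", "daylight", 9, 4),
    ("wet", "daylight", 9, 2),
    ("dry", "dark with street lights", 3, 2),
    ("dry", "daylight", 9, 4),
    ("wet", "daylight", 9, 2),
]

# Count how often each distinct row occurs and keep those seen more than once.
occurrences = Counter(rows)
duplicated = {row: count for row, count in occurrences.items() if count > 1}

# Extra copies beyond the first occurrence of each duplicated row.
extra_copies = sum(count - 1 for count in duplicated.values())

for row, count in sorted(duplicated.items(), key=lambda item: -item[1]):
    print(row, count)
```

Rows need to be hashable for this approach, which is why each one is stored as a tuple.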